Description

FIELD OF THE INVENTION 

The present invention relates to fusion proteins and to a process for preparing fusion proteins. The invention also pertains to various oligonucleotide and amino acid sequences which make up proteins of the present invention.

BACKGROUND OF THE INVENTION 

Proteins which, in addition to the desired protein, also contain an undesirable or "ballast" constituent in the end product are referred to as fusion proteins. When proteins are prepared by genetic engineering, the intermediate stage of a fusion protein is utilized particularly if, in direct expression, the desired protein is decomposed relatively rapidly by host-endogenous proteases, causing reduced or entirely inadequate yields of the desired protein.

The size of the ballast constituent of the fusion protein is usually selected in such a manner that an insoluble fusion protein is obtained. This insolubility not only provides the desired protection against the host-endogenous proteases but also permits easy separation from the soluble cell components. It is usually accepted that the proportion of the desired protein in the fusion protein is relatively small, i.e. that the cell produces a relatively large quantity of "ballast".

The preparation of fusion proteins with a short ballast constituent has been attempted. For example, a gene fusion was prepared which codes for a fusion protein from the first ten amino acids of β-galactosidase and somatostatin. However, it was observed that this short amino acid chain did not adequately protect the fusion protein against decomposition by the host-endogenous proteases (US-A 4 366 246, column 15, paragraph 2).

From EP-A 0 290 005 and 0 292 763, fusion proteins are known whose ballast constituent consists of a β-galactosidase fragment with more than 250 amino acids. These fusion proteins are insoluble, but they can easily be rendered soluble with urea (EP-A 0 290 005).

Although fusion proteins have been described in the art, the generation of fusion proteins with desirable traits such as protease resistance is a laborious procedure and often results in fusion proteins that have a number of undesirable characteristics. Thus, a need exists for an efficient process for producing fusion proteins with a number of attractive traits including protease resistance, proper folding, and effective cleavage of the ballast from the desired protein.

SUMMARY OF THE INVENTION

The present invention relates to a process for the preparation of fusion proteins. Fusion proteins of the present invention contain a desired protein and a ballast constituent. The process of the present invention involves generating an oligonucleotide library (mixture) coding for ballast constituents, inserting the mixed oligonucleotide (library) into a vector so that the oligonucleotide is functionally linked to a regulatory region and to the structural gene coding for the desired protein, and transforming host cells with the vector population so obtained. Transformants which express a fusion protein in high yield are then selected.

The oligonucleotide used in the process of the present invention may further code for an amino acid or for a group of amino acids which allows easy cleavage of the desired protein from the ballast constituent. The cleavage may be enzymatic or chemical.

The invention also pertains to an oligonucleotide designed so that it leads to an insoluble fusion protein which can 
easily be solubilized. Fusion proteins of the present invention thus fulfill the requirements established for protease 
resistance. 

Furthermore, the oligonucleotide of the present invention may be designed so that the ballast constituent does not interfere with folding of the desired protein.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 and its continuations in Figure 1a and Figure 1b show the construction of the plasmid population (gene bank) pINT4x from the known plasmid pH154/25* via plasmid pINT40. Other constructions have not been graphically presented because they are readily apparent from the figures.

Figure 2 is a map of plasmid pUH10 containing the complete HMG CoA reductase gene.
Figures 3 and 3a show the construction of pIK4, a plasmid containing the mini-proinsulin gene.

DETAILED DESCRIPTION OF THE INVENTION

The invention relates to a process for the preparation of a fusion protein, characterized in that a mixed oligonucleotide is constructed which codes for the ballast constituent of the fusion protein. The oligonucleotide mixture is introduced into a vector in such a manner that it is functionally linked to a regulatory region and to the structural gene for the desired protein. Appropriate host cells are transformed with the plasmid population obtained in this manner, and the clones producing a high yield of the encoded fusion protein are selected. Advantageous embodiments of this invention are explained below:

The oligonucleotide advantageously codes at the 3'-end for an amino acid or a group of amino acids which permits easy and preferably enzymatic cleavage of the ballast constituent from the desired protein. According to another embodiment, an oligonucleotide is constructed that yields an insoluble fusion protein which can easily be made soluble. In particular, an oligonucleotide is preferably constructed which codes for a ballast constituent that does not disturb the folding of the desired protein.

For practical reasons, the oligonucleotide for the ballast constituent is constructed according to the invention so that the ballast constituent is very short.

It was surprising to observe that, even when they have an extremely short ballast constituent, fusion proteins not only fulfill the requirements established for protease resistance, but are also produced at a high expression rate and, where the fusion protein is insoluble, can easily be rendered soluble if desired. In the dissolved or soluble state, the short ballast constituent according to the invention then permits a sterically favorable conformation of the desired protein so that it can be properly folded and easily separated from the ballast constituent.

If the desired protein is formed in a pro-form, the ballast constituent can be constituted in such a manner that its cleavage can occur concomitantly with the transformation of the pro-protein into the mature protein. In insulin preparation, for example, the ballast constituent and the C chain can be removed simultaneously, yielding a derivative of the mature insulin which can be transformed into insulin without any side reactions involving much loss.

The short ballast constituent according to the invention is actually shorter than the usual signal sequences of 
proteins and does not disturb the folding of the desired protein. It therefore need not be eliminated prior to the final 
processing step yielding the mature protein. 

The oligonucleotide coding for the ballast constituent preferably contains the DNA sequence (coding strand)

(DCD)x

in which D stands for A, G or T and x is 4-12, preferably 4-8.

In particular, the oligonucleotide is characterized by the DNA sequence (coding strand)

ATG (DCD)y (NNN)z

in which N in the NNN triplet stands for identical or different nucleotides, excluding stop codons, z is 1-4 and y+z is 6-12, preferably 6-10, wherein y is at least 4. It has proved advantageous for the oligonucleotide to have the DNA sequence (coding strand)

ATG (DCD)5-8 (NNN)

especially if it has the DNA sequence (coding strand)

ATG GCW (DCD)4-8 CGW

or, advantageously

ATG GCA (DCD)4-7 CGW

in which W stands for A or T.

The above-mentioned DNA model sequences fulfill all of these requirements. Codon DCD codes for the amino acids serine, threonine and alanine and therefore for a relatively hydrophilic protein chain. Stop codons are excluded and the selection of amino acids remains within a manageable scope. The following is a particularly preferred embodiment of the DNA sequence for the ballast constituent, especially if the desired protein is proinsulin:

ATG GCW (DCD)y' ACG CGW

or

ATG GCD (DCD)y' ACG CGT

in which y' signifies 3 to 6, especially 4 to 6.
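The statement that the degenerate triplet DCD codes only for serine, threonine and alanine can be checked by enumeration. The following is a minimal sketch in Python, assuming the standard genetic code; the variable names are illustrative and are not part of the specification.

```python
from itertools import product

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[i] for i, (a, b, c) in enumerate(product(BASES, repeat=3))
}

D = "AGT"  # degenerate symbol D: A, G or T (never C)

# Every concrete codon covered by the degenerate triplet DCD.
dcd_codons = [first + "C" + third for first in D for third in D]
dcd_amino_acids = {CODON_TABLE[codon] for codon in dcd_codons}

for codon in dcd_codons:
    print(codon, CODON_TABLE[codon])
```

The nine codons fall into three groups of three (ACD for Thr, GCD for Ala, TCD for Ser), and none of them is a stop codon.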

The second codon, GCD, codes for alanine and completes the recognition sequence for the restriction enzyme NcoI, provided that the preceding regulatory sequence ends with CC. The next-to-last triplet codes for threonine and, together with the codon CGT for arginine, represents the recognition sequence for the restriction enzyme MluI. Consequently, this oligonucleotide can be easily and unambiguously incorporated in gene constructions.
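As an illustration of the preceding paragraph, the sketch below checks one concrete ballast sequence against the model ATG GCD (DCD)y' ACG CGT and confirms that both recognition sequences are present. It uses the pINT41 ballast sequence from Table 1 and the well-known recognition sequences CCATGG (NcoI) and ACGCGT (MluI); the regular expression and variable names are assumptions made for this example only.

```python
import re

NCOI = "CCATGG"  # NcoI recognition sequence
MLUI = "ACGCGT"  # MluI recognition sequence

# Ballast coding sequence of clone pINT41 from Table 1, spaces removed.
ballast = "ATGGCAACAACATCAACAGCAACTACGCGT"

# Model sequence ATG GCD (DCD)y' ACG CGT with D = A, G or T and y' = 3 to 6.
pattern = re.compile(r"ATGGC[AGT]((?:[AGT]C[AGT]){3,6})ACGCGT")
match = pattern.fullmatch(ballast)
y_prime = len(match.group(1)) // 3

# The upstream regulatory sequence is assumed to end in CC, as the text requires.
construct = "CC" + ballast
has_ncoi = construct.startswith(NCOI)
has_mlui = construct.endswith(MLUI)

print("y' =", y_prime, "| NcoI site:", has_ncoi, "| MluI site:", has_mlui)
```

For this sequence the model is matched with y' = 6, the NcoI site spans the CC of the regulatory region plus ATGG, and the MluI site is formed by the last two codons ACG CGT.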

The (NNN)z group codes in the 3' position for an amino acid or a group of amino acids that permits simple, and preferably enzymatic, separation of the ballast constituent from the desired protein that follows. It is expedient to select the nucleotides in this group in such a manner that at the 3'-end they code for the cleavage site of a restriction enzyme which permits linkage of the structural gene for the desired protein. It is also advantageous for the ATG start codon and, if necessary, the first DCD triplet to be incorporated into the recognition sequence of a restriction enzyme so that the gene for the ballast constituent according to the invention can easily be inserted in the usual vectors.

The upper limit of z follows from the desired cleavage site for (enzymatic) cleavage of the fusion protein obtained, i.e. it encompasses codons, for example, for the amino acid sequence Ile-Glu-Gly-Arg in case cleavage is to be carried out with factor Xa. In general, the upper limit for the sum of y and z is 12, since the ballast constituent should of course be as small as possible and, above all, not interfere with the folding of the desired protein.

For reasons of expediency, bacteria or lower eukaryotic cells such as yeasts are preferred as the host organism in genetic engineering processes, provided that higher organisms are not required. In these processes, the expression of the heterologous gene is regulated by a homologous regulatory region, i.e. one that is intrinsic to the host or compatible with the host cell. If a pre-peptide is expressed, it often occurs that the pre-sequence is also heterologous to the host cell. In practice, this lack of "sequence harmony" frequently results in variable and unpredictable protein yields. Since the ballast sequence according to the invention is adapted to its environment, the selection process according to the invention yields a DNA construction characterized by this "sequence harmony".

The beginning and end of the ballast constituent are fixed in this construction: methionine is at the beginning, and an amino acid or a group of amino acids that permits the desired separation of the ballast constituent from the desired protein is at the end. If, for example, the desired protein is proinsulin, a triplet coding for arginine is advantageously selected as NNN, the last codon, as this permits the particularly favorable simultaneous cleaving off of the ballast constituent with the removal of the C chain. Of course, the end of the ballast constituent can also be an amino acid or a group of amino acids which allows a chemical cleavage, e.g. methionine, so that cleavage is possible with cyanogen bromide or chloride.

The intermediate amino acid sequence should be as short as possible so that folding of the desired protein is not affected. Moreover, this chain should be relatively hydrophilic so that solubilization is facilitated with undissolved fusion proteins and the fusion protein remains soluble. Cysteine residues are undesirable since they can interfere with the formation of the disulfide bridges.

The DNA coding for the ballast constituent is synthesized in the form of a mixed oligonucleotide; it is incorporated in a suitable expression plasmid immediately in front of the structural gene for the desired protein, and E. coli is transformed with the gene bank obtained in this manner. Appropriate gene structures can be obtained in this way by the selection of bacterial clones that produce corresponding fusion proteins.

It was previously mentioned that the cleavage sites for the restriction enzymes at the beginning and end of the nucleotide sequence coding for the ballast constituent are to be regarded as examples only. Recognition sequences that encompass the start codon ATG and in which any nucleotides that follow may include the codon for suitable amino acids are, by way of example, also those for the restriction enzymes AflIII, NdeI, NlaIII, NspHI or StyI. Since in the preferred embodiment arginine is to be at the end of the ballast sequence and since there are six different codons for arginine, additional appropriate restriction enzymes can also be found here for use instead of MluI, for example NruI, AvrII, AflIII, ClaI or HaeII.

However, it is also advantageous to use a "polymerase chain reaction" (PCR) according to Saiki, R.K. et al., Science 239:487-491, 1988, which can dispense with the construction of specific recognition sites for restriction enzymes.

It was previously indicated that limitation to the DNA sequence (DCD)x is for reasons of expediency and that this does not rule out other codons such as, for example, those for glycine, proline, lysine, methionine or asparagine.

The most efficient embodiment of this DNA sequence is obtained by selection of good producers of the fusion protein, i.e. the fusion protein containing proinsulin. This yields the most favorable combination of regulation sequence, ballast sequence and desired protein, as a result of which unfavorable combinations of promoter, ballast sequence and structural gene are avoided and good results are obtained with minimum expenditure in terms of the above-mentioned "sequence harmony".

Surprisingly, it was observed that the genes optimized for the ballast constituent according to the invention do not always contain the triplets preferred by E. coli. It was found that for Thr, codon ACA, which is used least frequently by E. coli, actually occurs frequently in the selected sequences. If, for example, the following amino acid sequence were optimized according to the preferred codon usage (p.c.u.) of E. coli (p.c.u.: Aota, S. et al., Nucleic Acids Research 16 (supplement): r315, r316, r391, r402 (1988)), we would obtain a totally different gene structure than that obtained according to the invention (cf. Table 1):



            Ala  Thr  Thr  Ser  Thr  Ala  Thr  Thr
p.c.u.      GCG  ACC  ACC  AGC  ACC  GCG  ACC  ACC
invention   GCA  ACA  ACA  TCA  ACA  GCA  ACT  ACG
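The comparison above can be reproduced with a few lines of Python. The sketch assumes the standard genetic code and takes the two codon rows exactly as listed (the p.c.u. row and the row obtained according to the invention); the helper name translate is illustrative.

```python
from itertools import product

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[i] for i, (a, b, c) in enumerate(product(BASES, repeat=3))
}

pcu = "GCG ACC ACC AGC ACC GCG ACC ACC".split()       # preferred codon usage
selected = "GCA ACA ACA TCA ACA GCA ACT ACG".split()  # obtained by selection


def translate(codons):
    """Translate a list of codons into a one-letter amino acid string."""
    return "".join(CODON_TABLE[codon] for codon in codons)


same_peptide = translate(pcu) == translate(selected)
differing = sum(a != b for a, b in zip(pcu, selected))

print(translate(pcu), translate(selected), same_peptide, differing)
```

Both rows translate to the same peptide (Ala-Thr-Thr-Ser-Thr-Ala-Thr-Thr), yet every one of the eight codons differs between them.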



In the case of the fusion proteins with a proinsulin constituent, the initial starting point was a ballast constituent with 10 amino acids. The DNA sequence of the best producer then served as the basis for variations in this sequence, whereupon it was noted that up to 3 amino acids can be eliminated without a noticeable loss in the relative expression rate. This finding is not only surprising, since it was unexpected that such a short ballast protein would be adequate, but also very advantageous, since the relative proportion of proinsulin in the fusion protein increases as the ballast constituent decreases.

The significance of the ballast constituent in the protein is apparent from the following comparison: Human proinsulin contains 86 amino acids. If, for a fusion protein according to EP-A 0 290 005, we take the lower limit of 250 amino acids for the ballast constituent, the fusion protein has 336 amino acids, only about one quarter of which occur in the desired protein. By comparison, a fusion protein according to the invention with only 7 amino acids in the ballast constituent has 93 amino acids, of which the proinsulin constituent amounts to 92.5%. If the desired protein has many more amino acids than proinsulin, the relationship between ballast and desired protein becomes even more favorable.
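The percentages in this comparison follow directly from the residue counts. The short Python sketch below redoes the arithmetic by number of amino acids (not by mass); the function name desired_fraction is hypothetical, and the 461-residue figure is the HMG-CoA reductase domain mentioned later in the description.

```python
def desired_fraction(desired_len: int, ballast_len: int) -> float:
    """Fraction of the fusion protein's residues that belong to the desired protein."""
    return desired_len / (desired_len + ballast_len)


PROINSULIN = 86   # amino acids in human proinsulin
HMG_DOMAIN = 461  # amino acids in the active HMG-CoA reductase domain

long_ballast = desired_fraction(PROINSULIN, 250)  # ballast per EP-A 0 290 005
short_ballast = desired_fraction(PROINSULIN, 7)   # ballast according to the invention
hmg_short = desired_fraction(HMG_DOMAIN, 7)       # larger desired protein

print(f"250-residue ballast: {long_ballast:.1%}")
print(f"7-residue ballast:   {short_ballast:.1%}")
print(f"HMG, 7 residues:     {hmg_short:.1%}")
```

The first two values reproduce the "about one quarter" and 92.5% stated above, and the third shows how the ratio improves further for a larger desired protein.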

As mentioned on a number of occasions, proinsulin as the desired protein represents only one preferred embodiment of the invention. The invention also works with much larger fusion proteins, for which a fusion protein with the active domain of human 3-hydroxy-3-methylglutaryl-coenzyme A reductase (HMG) is mentioned as an example. This protein contains 461 amino acids. A gene coding for the latter is known e.g. from EP-A 292 803.

Having now generally described the invention, the same will be more readily understood through reference to the 
following examples which are provided by way of illustration, and are not intended to be limiting of the present invention, 
unless specified. 



Example 1 

Construction of the gene bank and selection of a clone with high expression

If not otherwise indicated, all media are prepared according to Maniatis, T., Fritsch, E. F. and Sambrook, J.: Molecular Cloning, Cold Spring Harbor Laboratory (1982). TP medium consists of M9CA medium but with a glucose and casamino acid content of 0.4% each. If not otherwise indicated, all media contain 50 µg/ml ampicillin. Bacterial growth during fermentation is determined by measurement of the optical density of the cultures at 600 nm (OD). Percentage data refer to weight if no other data are reported.

The starting material is plasmid pH154/25* (Figure 1), which is known from EP-A 0 211 299, herein incorporated by reference. This plasmid contains a fusion protein gene (D'-Proin) linked to a trp promoter and a resistance gene for resistance against the antibiotic ampicillin (Amp). The fusion protein gene codes for a fusion protein that contains a fragment of the trpD protein from E. coli (D') and monkey proinsulin (Proin). The gene structure of the plasmid results in a polycistronic mRNA, which codes for both the fusion protein and the resistance gene product. To suppress the formation of excess resistance gene product, the (commercial) trp transcription terminator sequence (trpTer) (2) is first introduced between the two structural genes. To do so, the plasmid is opened with EcoRI and the protruding ends are filled in with Klenow polymerase. The resulting DNA fragment with blunt ends is linked with the terminator sequence (2)

5' AGCCCGCCTAATGAGCGGGCTTTTTTTT 3'
3' TCGGGCGGATTACTCGCCCGAAAAAAAA 5'   (2)

which results in plasmid pINT12 (Figure 1-(3)).

The starting plasmid pH154/25* contains a cleavage site for the enzyme PvuI in the Amp gene, as well as a HindIII cleavage site in the carboxy-terminal area of the trpD fragment. Both cleavage sites are therefore also contained in pINT12. By cutting the plasmid (Figure 1-(3)) with PvuI and HindIII, it is split into two fragments, from which the one containing the proinsulin gene (Figure 1-(4)) is isolated. Plasmid pGATTP (Figure 1a-(5)), which is structured in an analogous manner to (3) but which instead of the D'-Proin gene carries a gamma-interferon gene (Ifn) containing the restriction cleavage sites NcoI and HindIII, is also cut with PvuI and HindIII, and the fragment (Figure 1a-(6)) with the promoter region is isolated. By ligation of this fragment (6) with the fragment (4) obtained from (3), we acquire plasmid pINT40 (Figure 1a-(7)). The small fragment with the remainder of the gamma-interferon gene is cut from the latter with NcoI and MluI. The large fragment (Figure 1b-(8)) is ligated with the mixed oligonucleotide (9)



5' CATGGCDDCDDCDDCDDCDDCDDCDA 3'
3'     CGHHGHHGHHGHHGHHGHHGHTGCGC 5'   (9)

in which D stands for A, G or T and H signifies the complementary nucleotide. This results in the plasmid population (gene bank) pINT4x (Figure 1b-(10)). Mixed oligonucleotides of the present invention may be obtained by techniques well known to those of skill in the art.
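To give a sense of the size of this gene bank, the theoretical number of distinct sequences in the mixed oligonucleotide (9) can be counted from its coding strand. The sketch below is only a combinatorial upper bound under the assumption that each D position is an independent choice among A, G and T; it says nothing about how evenly the variants are represented in a real synthesis.

```python
# Coding strand of mixed oligonucleotide (9); D stands for A, G or T.
coding_strand = "CATGGCDDCDDCDDCDDCDDCDDCDA"

# Each D position independently takes one of three bases.
n_degenerate = coding_strand.count("D")
dna_variants = 3 ** n_degenerate

# Reading frame starts at the ATG (index 1); split into complete codons.
frame = coding_strand[1:]
codons = [frame[i:i + 3] for i in range(0, len(frame) - 2, 3)]

# GCD always codes for alanine, while each full DCD triplet can code for
# serine, threonine or alanine, so only DCD codons vary the peptide.
n_variable_codons = codons.count("DCD")
peptide_variants = 3 ** n_variable_codons

print(codons)
print(n_degenerate, dna_variants, n_variable_codons, peptide_variants)
```

Under these assumptions there are 13 degenerate positions, hence 3^13 = 1,594,323 possible DNA sequences, which collapse to 3^6 = 729 distinct ballast peptides.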

The mixed oligonucleotide (9) is obtained from the synthetic mixed oligonucleotide (9a)

TTCGGGTACCGHHGHHGHHGHHGHHGHHGHTGCGCAG 5'
TTGCCCATGGC 3'   (9a)

which is filled in with Klenow polymerase and cut with MluI and NcoI.

The strain E. coli WS3110 is transformed with the plasmid population (10) and the bacteria are plated on LB agar dishes. Six of the resulting bacterial clones are tested for their ability to produce a fusion protein with an insulin constituent. For this purpose, overnight cultures of the clones are prepared in LB medium, and 100 µl aliquots of the cultures are mixed with 10.5 ml TP medium and shaken at 37°C. At OD600 = 1 the cultures are adjusted to 20 µg/ml 3-β-indolylacrylic acid (IAA), a solution of 40 mg glucose in 100 ml water is added and the preparation is shaken for another three hours at 37°C. Subsequently 6 OD equivalents of the culture are removed, and the bacteria contained therein are harvested by centrifugation and resuspended in 300 µl test buffer (37.5 mM tris of pH 8.5, 7 M urea, 1% (w/v) SDS and 4% (v/v) 2-mercaptoethanol). The suspension is heated for five minutes and treated for two seconds with ultrasound to reduce viscosity, and aliquots thereof are subsequently subjected to SDS-gel electrophoresis. With bacteria that produce fusion protein, a protein band with a molecular weight of 10,350 D can be expected. It is evident that one of the clones, pINT41 (Table 1), produces an appropriate protein in relatively large quantities while no such protein formation is seen with the remaining clones. An immune blot experiment with insulin-specific antibodies confirms that the protein coded by pINT41 contains an insulin constituent.

Table 1 shows the DNA and amino acid sequence of the ballast constituent for a number of plasmid constructs. In particular, Table 1 illustrates the DNA and amino acid sequence of the ballast constituent in the pINT41 fusion protein.
Table 1

1   2   3   4   5   6   7   8   9   10  11   pINT

Met Ala Thr Thr Ser Thr Ala Thr Thr Arg
ATG GCA ACA ACA TCA ACA GCA ACT ACG CGT   41

Thr Ser Thr
*** **G A*T T*G A*G **G *** ***   42

Ala Thr Ser Thr Ser
**T G** *** A*T T*T A*T T*A *** ***   43

Asn Ser - -
*** *** AAC T*A *** ***   60

*** *** *** *** *** *** **A   67d

*** *** *** *** *** *** **A   68d

*** *** *** *** *** *** **A   69d,72d

Gly Asn Ser Ala
*** *** *** *** *** *** *G* *A* T** GCA **A   90d,91d

Lys
*** *** *** *** *** *** AA* **A   93d

Pro
*** *** *** *** *** *** C** **A   94d

Met
*** *** *** *** *** *** ATG **A   95d

Gly
*** *** *** *** *** *** *G* **A   96d

Example 2 

Selection of additional clones 

To detect additional suitable clones, a method according to Helfman, D.M. et al. (Proc. Natl. Acad. Sci. USA 80: 31-35, 1983) is used. TP agar dishes, the medium of which contains an additional 40 µg/ml IAA, are utilized for this purpose. Fifteen minutes before use, the agar surface of the plates is coated with a 2-mm thick TP top agar layer, a nitrocellulose filter is placed on the latter and freshly transformed cells are placed on the filter. Copies are made of the filters on which bacterial colonies have grown following incubation at 37°C, and the bacteria from the original filter are lysed. To accomplish this, the filters are exposed to a chloroform atmosphere in a desiccator for 15 minutes, subsequently moved slowly for six hours at room temperature in immune buffer (50 mM tris of pH 7.5, 150 mM NaCl, 5 mM MgCl2, and 3% (w/v) BSA), which contains an additional 1 µg/ml DNase I and 40 µg/ml lysozyme, and then washed twice for five minutes in washing buffer (50 mM tris of pH 7.5 and 150 mM NaCl). The filters are then incubated overnight at 3°C in immune buffer with insulin-specific antibodies, washed four times for five minutes with washing buffer, incubated for one hour in immune buffer with a protein A-horseradish peroxidase conjugate and washed again four times for five minutes with washing buffer, and colonies that have bound antibodies are visualized with a color reaction. Clones pINT42 and pINT43, which also produce fairly large quantities of fusion protein, are found in this manner among 500 colonies. The DNA sequences obtained by sequencing and the amino acid sequences derived from them are also reproduced in Table 1.

Example 3 

Preparation of plasmid pINT41d

Between the replication origin and the trp promoter, plasmid pINT41 contains a nonessential DNA region which is flanked by cleavage sites for the enzyme Nsp(7524)I. To remove this region from the plasmid, pINT41 is cut with Nsp(7524)I, and the larger of the resulting fragments is isolated and religated. This gives rise to plasmid pINT41d, the DNA sequence of which is reproduced in Table 2.
Table 2: DNA sequence of plasmid pINT41d



10 30 50 

GTGTCATGGTCGGTGATCGCCAGGGTGCCGACGCGCATCTCGACTTGCACGGTGCACCAA 

70 90 110 

TGCTTCTGGCGTCAGGCAGCCATCGGAAGCTGTGGTATGGCTGTGCAGGTCGTAAATCAC 

130 150 170 

TGCATAATTCGTGTCGCTCAAGGCGCACTCCCGTTCTGGATAATGTTTTTTGCGCCGACA 

190 210 230 

TCATAACGGTTCTGGCAAATATTCTGAAATGAGCTGTTGACAATTAATCATCGAACTAGT 

250 270 290 

TAACTAGTACGCAAGTTCACGTAAAAAGGGTATCGACCATGGCAACAACATCAACAGCAA 

310 330 350 

CTACGCGTTTCGTGAACCAGCACCTGTGCGGCTCCCACCTAGTGGAAGCTCTCTACCTGG 

370 390 410 

TGTGCGGGGAGCGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGACCCTC 

430 450 470 

AGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGCGCAGGCAGCCTGCAGCCCTTGGCGC 

490 510 530 

TGGAGGGGTCCCTGCAGAAGCGCGGCATCGTGGAGCAGTGCTGCACCAGCATCTGCTCCC 

550 570 590 

TCTACCAGCTGGAGAACTACTGCAACTAATAGTCGACCTTTGCTTTCATTGTCGATGATA 

610 630 650 

AGCTGTCAAACATGAGAATTAGCCCGCCTAATGAGCGGGCTTTTTTTTAATTCTTGAAGA 

670 690 710 

CGAAAGGGCCTCGTGATACGCCTATTTTTATAGGTTAATGTCATGATAATAATGGTTTCT 

730 750 770 

TAGACGTCAGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCTATTTGTTTATTTTTC 

790 810 830 

TAAATACATTCAAATATGTATCCGCTCATGAGACAATAACCCTGATAAATGCTTCAATAA 

850 870 890 

TATTGAAAAAGGAAGAGTATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTT 

910 930 950 

GCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGTGAAAGTAAAAGATGCT 

970 990 1010 

GAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGATCTCAACAGCGGTAAGATC 

1030 1050 1070 

CTTGAGAGTTTTCGCCCCGAAGAACGTTTTCCAATGATGAGCACTTTTAAAGTTCTGCTA 

1090 1110 1130 

TGTGGCGCGGTATTATCCCGTGTTGACGCCGGGCAAGAGCAACTCGGTCGCCGCATACAC 

1150 1170 1190 

TATTCTCAGAATGACTTGGTTGAGTACTCACCAGTCACAGAAAAGCATCTTACGGATGGC 

1210 1230 1250 

ATGACAGTAAGAGAATTATGCAGTGCTGCCATAACCATGAGTGATAACACTGCGGCCAAC 

1270 1290 1310 

TTACTTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAACATGGGG 

1330 1350 1370 

GATCATGTAACTCGCCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGAC 

1390 1410 1430 

GAGCGTGACACCACGATGCCTGCAGCAATGGCAACAACGTTGCGCAAACTATTAACTGGC 
1450 1470 1490

GAACTACTTACTCTAGCTTCCCGGCAACAATTAATAGACTGGATGGAGGCGGATAAAGTT 

1510 1530 1550

GCAGGACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGA 

1570 1590 1610 

GCCGGTGAGCGTGGGTCTCGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCC 

1630 1650 1670 

CGTATCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAG 

1690 1710 1730 

ATCGCTGAGATAGGTGCCTCACTGATTAAGCATTGGTAACTGTCAGACCAAGTTTACTCA 

1750 1770 1790 

TATATACTTTAGATTGATTTAAAACTTCATTTTTAATTTAAAAGGATCTAGGTGAAGATC 

1810 1830 1850

CTTTTTGATAATCTCATGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCA 

1870 1890 1910 

GACCCCGTAGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGC 

1930 1950 1970 

TGCTTGCAAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTA 

1990 2010 2030 

CCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTCCTT 

2050 2070 2090 

CTAGTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTC 

2110 2130 2150 

GCTCTGCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGG 

2170 2190 2210 

TTGGACTCAAGACGATAGTTACCGGTAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCGT 

2230 2250 2270 

GCACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTACAGCGTGAGC 

2290 2310 2330 

ATTGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCA

2350 2370 2390 

GGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATA 

2410 2430 2450 

GTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGCTCGTCAGGGG 

2470 2490 2510 

GGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCT 

2530 2550 2570 

GGCCTTTTGCTCACATGTGTCAGAGGTTTTCACCGTCATCACCGAAACGCGCGAGGCAGC 

TG 



Example 4 

Fermentation and processing of pINT41d fusion protein

(i) Fermentation: A shaking culture in LB medium is prepared from E. coli W3110 transformed with pINT41d. Fifteen ml of this culture, which has an OD = 2, are then put into 15.7 l TP medium and the suspension is fermented for 16 hours at 37°C. The culture, which at this time has an OD = 13, is then adjusted to 20 µg/ml IAA and, until the end of fermentation after another five hours, a 50% (w/v) maltose solution is continuously pumped in at a rate of 100 ml/hour. An OD = 17.5 is attained in this process. At the end, the bacteria are harvested by centrifugation.

(ii) Rupture of cells: The cells are resuspended in 400 ml disintegration buffer (10 mM tris of pH 8.0, 5 mM EDTA) and disrupted in a French press. The fusion protein containing insulin is subsequently concentrated by 30 minutes of centrifugation at 23,500 g and washed with disintegration buffer. This yields 134 g sediment (moist substance).
(iii) Sulfitolysis: 12.5 g sediment (moist substance) from (ii) are stirred into 125 ml of an 8 M urea solution at 35°C. After stirring for thirty minutes, the solution is adjusted to pH 9.5 with sodium hydroxide solution and reacted with 1 g sodium sulfite. After an additional thirty minutes of stirring at 35°C, 0.25 g sodium tetrathionate is added and the mixture is again stirred for thirty minutes at 35°C.

(iv) DEAE anion exchange chromatography: The entire batch of (iii) is diluted with 250 ml buffer A (50 mM glycine, pH 9.0) and placed on a chromatography column which contains Fractogel® TSK DEAE-650 (column volume 130 ml, column diameter 26 mm) equilibrated with buffer A. After washing with buffer A, the fusion protein-S-sulfonate is eluted with a salt gradient consisting of 250 ml each of buffer A and buffer B (50 mM glycine of pH 9.0, 3 M urea and 1 M NaCl) at a flow rate of 3 ml/minute. The fractions containing fusion protein-S-sulfonate are then combined.

(v) Folding and enzymatic cleavage: The combined fractions from (iv) are diluted at 4°C in a volume ratio of 1 + 9 with folding buffer (50 mM glycine, pH 10.7), and per liter of the resulting dilution 410 mg ascorbic acid and 165 µl 2-mercaptoethanol are added at 4°C under gentle stirring. After correction of the pH value to pH 10.5, stirring is continued for another 4 hours at 4°C. Subsequently, solid N-(2-hydroxyethyl)-piperazine-N'-2-ethane sulfonic acid (HEPES) is added to an end concentration of 24 g per batch-liter. The mixture, which now has pH 8, is digested with trypsin at 25°C. During the process, the enzyme concentration in the digestion mixture is 80 µg/l. The course of cleavage is followed analytically by RP-HPLC. After two hours, digestion can be stopped by addition of 130 µg soybean trypsin inhibitor. HPLC shows the formation of 19.8 mg di-Arg insulin from a mixture according to (iii). The identity of the cleavage product is confirmed by protein sequencing and comparative HPLC with reference substances. The di-Arg insulin can be chromatographically purified according to known methods and transformed to insulin with carboxypeptidase B.

Example 5

Construction of plasmid pINT60

Plasmid pINT60 results in an insulin precursor, the ballast sequence of which consists of only nine amino acids.
For construction of this plasmid, plasmid pINT40 is cut with Nco and MluI and the resulting vector fragment is isolated.
The oligonucleotide Insu15

TTCGGGTACCGTTGTTGTAGTTTGAGTTGCGCAG 5'

TTGCCCATGGC 3'

is then synthesized, filled in with Klenow polymerase and also cut with these two enzymes. The resulting DNA fragment
is then ligated with the vector fragment to yield plasmid pINT60.
Table 1 shows the DNA and amino acid sequence of the ballast constituent in this fusion protein.
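
As an illustrative check of the nine-amino-acid ballast stated in Example 5, the long strand of the oligonucleotide can be complemented and translated. This is a sketch under assumptions: the long strand is taken exactly as printed and read left to right in the 3' to 5' direction (its 5' label is at the right-hand end), and the recognition sequences CCATGG (NcoI) and ACGCGT (MluI) are the standard ones. The helper names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate a coding-strand sequence (5'->3') codon by codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Long strand of the oligonucleotide as printed, read 3'->5' (assumption).
long_strand_3_to_5 = "TTCGGGTACCGTTGTTGTAGTTTGAGTTGCGCAG"

# Complementing base by base gives the coding strand in 5'->3' orientation.
coding_strand = long_strand_3_to_5.translate(COMPLEMENT)

nco_site = coding_strand.find("CCATGG")   # NcoI recognition sequence
mlu_site = coding_strand.find("ACGCGT")   # MluI recognition sequence
start = coding_strand.find("ATG")         # start codon inside the NcoI site

# Ballast region: from the start codon to the end of the MluI site.
ballast_dna = coding_strand[start:mlu_site + 6]
ballast_protein = translate(ballast_dna)

if __name__ == "__main__":
    print(coding_strand)
    print(ballast_dna, ballast_protein, len(ballast_protein))
```

Under these assumptions the region between the start codon and the end of the MluI site spans nine codons, in line with the ballast length given in the text.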

Example 6

Construction of plasmid pINT67d

Plasmid pINT67d is a derivative of pINT41d in which the codon of the amino acid in position nine of the ballast
sequence is deleted. Like pINT60, it therefore results in an insulin precursor with a ballast sequence of nine amino
acids. A method according to Ho, S.N. et al. (Gene 77:51-59, 1989) is used for its construction. For this purpose, two
separate PCRs are first performed with plasmid pINT41d and the two oligonucleotide pairs

TIR: 5'-CTG AAA TGA GCT GTT GAC-3'

and

DTR8: 5'-CAC AAA TCG AGT TGC TGT TGA TGT TGT-3'

or

DTR9: 5'-ACA GCA ACT CGA TTT GTG AAC GAG CAC-3'

and

Insu11: 5'-TCA TGT TTG ACA GCT TAT CAT-3'.

This produces two fragments that are partially complementary to each other and that, when annealed with each other,
code for an insulin precursor similar to that of pINT41d in which, however, the amino acid in position nine is absent.
For completion, the two fragments are combined and subjected to another PCR together with the oligonucleotides TIR
and Insu11. From the DNA fragment obtained in this manner, the structural gene of the insulin precursor is liberated
with Nco and SalI and purified. Plasmid pINT41d is then also cut with these two enzymes, the vector fragment is purified
and subsequently ligated with the structural gene fragment from the PCR to yield plasmid pINT67d.

The nucleotide and amino acid sequences for the ballast region have been reproduced in Table 1.
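
The partial complementarity of the two PCR fragments in Example 6 comes from the inner primers DTR8 and DTR9. The sketch below, which is only an illustration and not part of the cited method, computes the stretch shared by the reverse complement of DTR8 and the 5' end of DTR9. The primer sequences are taken as printed; the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_overlap(left: str, right: str) -> str:
    """Longest suffix of `left` that is also a prefix of `right`."""
    for n in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:n]):
            return right[:n]
    return ""


# Inner primers of Example 6, spaces removed, as printed in the text.
DTR8 = "CACAAATCGAGTTGCTGTTGATGTTGT"
DTR9 = "ACAGCAACTCGATTTGTGAACGAGCAC"

# DTR8 primes the opposite strand, so compare its reverse complement with DTR9.
overlap = longest_overlap(reverse_complement(DTR8), DTR9)

if __name__ == "__main__":
    print(reverse_complement(DTR8))
    print(overlap, len(overlap))
```

With the sequences as printed, the two primers share an 18-nucleotide stretch, which is what lets the two first-round fragments anneal to each other in the second PCR.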

Example 7

Construction of plasmid pINT68d

Like plasmid pINT67d, plasmid pINT68d is a shortened derivative of plasmid pINT41d, in which the codons of the
two amino acids in positions eight and nine of the ballast sequence are deleted. It therefore results in an insulin precursor
with a ballast sequence of only eight amino acids. The procedure previously described in Example 6 is used for its
construction but with the two oligonucleotide pairs

TIR: 5'-CTG AAA TGA GCT GTT GAC-3'

and

DTR10: 5'-CAC AAA TCG TGC TGT TGA TGT TGT TGC-3'

or

DTR11: 5'-TCA ACA GCA CGA TTT GTG AAC CAG CAC-3'

and

Insu11: 5'-TCA TGT TTG ACA GCT TAT CAT-3'.

The nucleotide and amino acid sequences for the ballast region have been reproduced in Table 1.

Example 8

Construction of plasmid pINT69d

Plasmid pINT69d is also a shortened derivative of plasmid pINT41d in which, however, the codons of the three
amino acids in positions seven, eight and nine of the ballast sequence have been deleted. It therefore results in an
insulin precursor with a ballast sequence of only seven amino acids. The procedure described in Example 6 is also
used for its construction but with the two oligonucleotide pairs

TIR: 5'-CTG AAA TGA GCT GTT GAC-3'

and

DTR12: 5'-CAC AAA TCG TGT TGA TGT TGT TGC CAT-3'

or

DTR13: 5'-ACA TCA ACA CGA TTT GTG AAC GAG CAC-3'

and

Insu11: 5'-TCA TGT TTG ACA GCT TAT CAT-3'.

The nucleotide and amino acid sequences for the ballast region have been reproduced in Table 1.

Example 9

Construction of plasmid pINT72d

Plasmid pINT72d is a derivative of plasmid pINT69d in which the entire C-peptide gene region, with the exception
of the first codon for the amino acid arginine, is deleted. Consequently, this results in a "miniproinsulin derivative" with
an arginine residue instead of a C-chain. With plasmid pINT69d as a starting point, the procedure described in Example
6 is also used for its construction but with the two oligonucleotide pairs

TIR: 5'-CTG AAA TGA GCT GTT GAC-3'

and

Insu28: 5'-GAT GCC GCG GGT CTT GGG TGT-3'

or

Insu27: 5'-AAG ACC CGC GGC ATC GTG GAG-3'

and

Insu11: 5'-TCA TGT TTG ACA GCT TAT CAT-3'.

Example 10

Construction of plasmids pINT73d, pINT88d and pINT89d

Plasmid pINT73d is a derivative of plasmid pINT69d (Example 8), in which the insulin precursor gene is arranged
two times in succession. The plasmid therefore results in the formation of a polycistronic mRNA, which can double the
yield. For its construction, a PCR is carried out with plasmid pINT69d and the two oligonucleotides

Insu29: 5'-CTA GTA CTC GAG TTC AC-3'

and

Insu11: 5'-TCA TGT TTG ACA GCT TAT CAT-3'.

This gives rise to a fragment with the insulin precursor gene and the pertinent ribosome binding site, which has a
cleavage site for enzyme XhoI in its 5'-end region and a cleavage site for SalI in its 3'-end region. The fragment is cut
with the two above-mentioned enzymes and purified. Plasmid pINT69d is then linearized with SalI, the two DNA ends
produced are dephosphorylated with phosphatase (from calf intestine) and ligated with the fragment from the PCR
to yield plasmid pINT73d.

Plasmids pINT88d and pINT89d are obtained in an analogous manner when plasmid pINT72d (Example 9)
is modified by arranging the "miniproinsulin gene" twice or thrice in sequence.

Example 11

Construction of plasmid pINL41d

The starting plasmid pRUD3 has a structure analogous to that of plasmid pGATTR. However, instead of the
trp-promoter region, it contains a tac-promoter region which is flanked by cleavage sites for enzymes EcoRI and Nco. The
plasmid is cut with EcoRI, whereupon the protruding ends of the cleavage site are filled in with Klenow polymerase.
Cutting is performed subsequently with Nco and the ensuing promoter fragment is isolated.

The trp-promoter of plasmid pINT41d is flanked by cleavage sites for enzymes PvuII and Nco. Since the plasmid
has an additional cleavage site for PvuII, it is completely cut with Nco, but only partially with PvuII. The vector fragment,
which is missing only the promoter region, is then isolated from the ensuing fragments. This is then ligated with the
tac-promoter fragment to yield plasmid pINL41d.

Example 12

Construction of plasmid pL41c

Plasmid pPL-lambda (which can be obtained from Pharmacia) has a lambda-pL-promoter region. The latter is
flanked by the nucleotide sequences

5' GATCTCTCACCTACCAAACAAT 3'

and

5' AGCTAACTGACAGGAGAATCC 3'.

Oligonucleotides

LPL3: 5' ATGAATTCGATCTCTCACCTACCAAACAAT 3'

and

LPL4: 5' TTGCCATGGGGATTCTCCTGTCAGTTAGCT 3'

are prepared for additional flanking of the promoter region with cleavage sites for enzymes EcoRI and Nco. A PCR is
carried out with these oligonucleotides and pPL-lambda and the resulting promoter fragment is cut with EcoRI and Nco
and isolated. Plasmid pINL41d is then also cut with these two enzymes and the ensuing vector fragment, which has no
promoter, is then ligated with the lambda-pL-promoter fragment to yield plasmid pL41c.
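
The design of the two primers in Example 12 can be made explicit: each consists of a short 5' tail carrying the new restriction site, followed by a stretch that matches one of the two flanking sequences. The sketch below is only an illustration. It assumes the standard recognition sequences GAATTC (EcoRI) and CCATGG (NcoI); the function and variable names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def split_primer(primer: str, annealing_region: str) -> tuple[str, str]:
    """Split a primer into its added 5' tail and its template-matching 3' part."""
    if not primer.endswith(annealing_region):
        raise ValueError("primer does not end with the expected annealing region")
    return primer[: len(primer) - len(annealing_region)], annealing_region


# Flanking sequences of the lambda-pL-promoter region, as given in Example 12.
FLANK_UPSTREAM = "GATCTCTCACCTACCAAACAAT"
FLANK_DOWNSTREAM = "AGCTAACTGACAGGAGAATCC"

LPL3 = "ATGAATTCGATCTCTCACCTACCAAACAAT"
LPL4 = "TTGCCATGGGGATTCTCCTGTCAGTTAGCT"

# LPL3 matches the upstream flank directly; LPL4 matches the reverse
# complement of the downstream flank because it primes the opposite strand.
lpl3_tail, lpl3_anneal = split_primer(LPL3, FLANK_UPSTREAM)
lpl4_tail, lpl4_anneal = split_primer(LPL4, reverse_complement(FLANK_DOWNSTREAM))

if __name__ == "__main__":
    print("LPL3 tail:", lpl3_tail, "EcoRI site present:", "GAATTC" in lpl3_tail)
    print("LPL4 tail:", lpl4_tail, "NcoI site present:", "CCATGG" in lpl4_tail)
```

Read this way, the tails contribute the EcoRI and Nco cleavage sites, while the remaining 3' portions anneal to the promoter flanks on pPL-lambda.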

Example 13

Construction of plasmid pL41d

The trp-transcription terminator located between the resistance gene and the fusion protein gene in plasmid pL41c
is not effective in E. coli strains that are suitable for fermentation (e.g. E. coli N4830-1). For this reason, a polycistronic
mRNA and with it a large quantity of resistance gene product are formed in fermentation. To prevent this side reaction,
the trp-terminator sequence is replaced by an effective terminator sequence of the E. coli rrnB operon. Plasmid pANGMA
has a structure similar to that of plasmid pINT41d, but it has an angiogenin gene instead of the fusion protein gene
and an rrnB-terminator sequence (from commercial plasmid pKK223-3, which can be obtained from Pharmacia) instead
of the trp-terminator sequence. The plasmid is cut with PvuI and SalI and the fragment containing the rrnB-terminator
is isolated. Plasmid pL41c is then also cut with these two enzymes and the fragment containing the insulin gene is
isolated. The two isolated fragments are then ligated to yield plasmid pL41d.



Example 14

Construction of plasmid pINTLI

To prepare a plasmid for general use in the expression of fusion proteins, the proinsulin gene of plasmid pINT41d
is replaced by a polylinker sequence. This gene is flanked by cleavage sites for enzymes MluI and SalI. The plasmid
is therefore cut with the help of the two above-mentioned enzymes and the vector fragment is isolated. This is then
ligated, to yield plasmid pINTLI, with the following two synthetic oligonucleotides

          BstEII      AccI     EcoRI       KpnI     BamHI
5' CGCGCCTGGTTACCTCGAGGTATACTACGAATTCGAGCTCGGTACCCGCGGATCC
3'     GGACCAATGGAGCTCCATATGATGCTTAAGCTCGAGCCATGGGCCCCTAGG
                XhoI                 SacI      XmaI

         SphI         XbaI
   CTGCAGGCATGCAAGCTTGTCTAGAC-3'
   GACGTCCGTACGTTCGAACAGATCTGAGCT-5'
   PstI        HindIII       (SalI).

Example 15

Insertion of a gene coding for HMG CoA-reductase (active domain) in pINTLI and expression of the fusion protein

Table 3 represents the DNA and amino acid sequence of the gene for HMG CoA-reductase. The synthetic gene for
HMG CoA-reductase known from EP-A 0 292 803 (herein incorporated by reference) contains a cleavage site for
BstEII in the region of amino acids Leu and Val in positions 3 and 4 (see Table 3). A protruding sequence corresponding
to enzyme XbaI occurs at the end of the gene (in the noncoding area). The corresponding cleavage sites in the polylinker
of plasmid pINTLI are in the same reading frame. Both cleavage sites are in each case singular.

Plasmid pUH10 contains the complete HMG gene (HMG fragments I, II, III and IV), corresponding to the DNA
sequence of Table 3. Construction of pUH10 (figure 2) is described in EP-A 0 292 803, herein incorporated by reference.
Briefly, special plasmids are prepared for the subcloning of the gene fragments HMG I to HMG IV and for the construction
of the complete gene. These plasmids are derived from the commercially available vectors pUC18, pUC19 and
M13mp18 or M13mp19, with the polylinker region having been replaced by a new synthetic polylinker corresponding
to DNA sequence VI

         Nco               EcoRI    HindIII  BamHI     XbaI
VI-1a
5' AAT TGC CAT GGG CAT GCG GAA TTC CAA GCT TTG GAT CCA TCT AGA GGG
3'      CG GTA CCC GTA CGC CTT AAG GTT CGA AAC CTA GGT AGA TCT CCC TCG A
VI-1b

These new plasmids have the advantage that, in contrast to the pUC and M13mp plasmids, they allow the cloning
of DNA fragments having the protruding sequences for the restriction enzyme Nco. Moreover, the recognition sequences
for the cleavage sites Nco, EcoRI, HindIII, BamHI and XbaI are contained in the vectors in exactly the sequence
in which they are present in the complete gene HMG, which facilitates the sequential cloning and the construction of
this gene. Thus it is possible to subclone the gene fragments HMG I to HMG IV in the novel plasmids. After the gene
fragments have been amplified, it is possible for the latter to be combined to give the complete gene (see below).

a. Preparation of vectors which contain DNA sequence VI

DNA sequence VI may be prepared by standard techniques. The commercially available plasmid pUC18 (or
pUC19, M13mp18 or M13mp19) is opened with the restriction enzymes EcoRI/HindIII as stated by the manufacturer.
The digestion mixture is fractionated by electrophoresis on a 1% agarose gel. The plasmid bands which have been
visualized by ethidium bromide staining are cut out and eluted from the agarose by electrophoresis. 20 fmol of the
residual plasmid thus obtained are then ligated with 200 fmol of the DNA fragment corresponding to DNA sequence
VI at room temperature overnight. A new cloning vector pSU18 (or pSU19, M13mUS18 or M13mUS19) is obtained.
In contrast to the commercially available starting plasmids, the new plasmids can be cut with the restriction enzyme
Nco. The restriction enzymes EcoRI and HindIII likewise cut the plasmids only once because the polylinker which is
inserted via the EcoRI and HindIII cleavage sites destroys these cleavage sites which are originally present.

b. Preparation of the hybrid plasmids which contain the gene fragments HMG I to HMG IV

i) Plasmid containing the gene fragment HMG I

The plasmid pSU18 is cut open with the restriction enzymes EcoRI and Nco in analogy to the description in Example
15 (a) above, and is ligated with the gene fragment I which has previously been phosphorylated.

ii) Plasmid containing the gene fragment HMG II

The plasmids with the gene subfragments HMG II-1, II-2 and II-3 are subjected to restriction enzyme digestion
with EcoRI/MluI, MluI/BssHII or BssHII/HindIII to isolate the gene fragments HMG II-1, HMG II-2 or HMG II-3,
respectively. The latter are then ligated in a known manner into the plasmid pSU18 which has been opened with EcoRI/HindIII.

iii) Plasmid containing the gene fragment HMG III

The plasmids with the gene subfragments HMG III-1 and III-3 are digested with the restriction enzymes EcoRI/HindIII
and then cut with Sau96I to isolate the gene fragment HMG III-1, or with BamHI/BanII to isolate the gene
fragment HMG III-3. These fragments can be inserted with the HMG III-2 fragment into a pSU18 plasmid which has
been opened with HindIII/BamHI.

iv) Plasmid containing the gene fragment HMG IV

The plasmids with the gene subfragments HMG IV-(1+2) and IV-(3+4) are opened with the restriction enzymes
EcoRI/BamHI and EcoRI/XbaI, respectively, and the gene fragments HMG IV-(1+2) and HMG IV-(3+4) are purified by
electrophoresis. The resulting fragments are then ligated into a pSU18 plasmid which has been opened with BamHI/XbaI
and in which the EcoRI cleavage site has previously been destroyed with S1 nuclease as described below. A
hybrid plasmid which still contains an additional AATT nucleotide sequence in the DNA sequence IV is obtained. The
hybrid plasmid is opened at this point by digestion with the restriction enzyme EcoRI, and the protruding AATT ends
are removed with S1 nuclease. For this purpose, 1 µg of plasmid is, after EcoRI digestion, incubated with 2 units of
S1 nuclease in 50 mM sodium acetate buffer (pH 4.5), containing 200 mM NaCl and 1 mM zinc chloride, at 20°C for
30 minutes. The plasmid is recircularized in a known manner via the blunt ends. A hybrid plasmid which contains the
gene fragment IV is obtained.

c. Construction of the hybrid plasmid pUH10 which contains the DNA sequence V

The hybrid plasmid with the gene fragment HMG I is opened with EcoRI/HindIII and ligated with the fragment HMG
II which is obtained by restriction enzyme digestion of the corresponding hybrid plasmid with EcoRI/HindIII. The
resulting plasmid is then opened with HindIII/BamHI and ligated with the fragment HMG III which can be obtained from
the corresponding plasmid using HindIII/BamHI. The plasmid obtained in this way is in turn opened with BamHI/XbaI
and linked to the fragment HMG IV which is obtained by digestion of the corresponding plasmid with BamHI/XbaI. The
hybrid plasmid pUH10, which contains the complete HMG gene, corresponding to DNA sequence V, is obtained. Figure
2 shows the map of pUH10 diagrammatically, with "ori" and "Ap^r" indicating the orientation in the residual plasmid
corresponding to pUC18.

If pINTLI is cut with BstEII and XbaI and the large fragment is isolated, and if, on the other hand, plasmid pUH10
(figure 2) is digested with the same enzymes and the fragment which encompasses most of the DNA sequence V from
this plasmid is isolated, ligation of the two fragments yields a plasmid which codes for a fusion protein in which
arginine follows the first eight amino acids in the ballast sequence of pINT41d (Table 1), which is followed, starting with
Leu3, by the structural gene of the active domain of HMG CoA-reductase. For purposes of comparison, the two initial
plasmids are cut with enzymes Nco and XbaI and the corresponding fragments are ligated together, yielding a plasmid
which codes, immediately after the start codon, for the active domain of HMG CoA-reductase (in accordance with DNA
sequence V of EP-A 0 292 803, see Table 3).

Expression of the coded proteins occurs according to Example 4. Following the breakup of the cells, centrifugation
is performed, whereupon the expected protein of approximately 55 kDa is determined in the supernatant by gel
electrophoresis. The band for the fusion protein is much more intensive here than for the protein expressed directly.
Individual portions of 100 nl of the supernatant are tested in undiluted form, in a dilution of 1:10 and in a dilution of 1:100
for the formation of mevalonate. As an additional comparison, the fusion protein according to Example 4 (fusion protein
with proinsulin constituent) is tested; no activity is apparent in any of the three concentrations. The fusion protein with
the HMG CoA-reductase constituent exhibits maximum activity in all three dilutions, while the product of the direct
expression shows graduated activity governed by the concentration. This indicates better expression of the fusion
protein by a factor of at least 100.

Example 16

Construction of plasmid pB70

Plasmid pINT41d is split with MluI and SalI and the large fragment is isolated. Plasmid pIK4, shown in figure 3a,
contains a gene for "mini-proinsulin", the C chain of which consists of arginine only.

The construction of this plasmid has previously been described in EP-A 0 347 781 (herein incorporated by reference).
Briefly, the commercial plasmid pUC19 is opened using the restriction enzymes KpnI and PstI and the large
fragment (figure 3-(1)) is separated through a 0.8% strength "Seaplaque" gel. This fragment is reacted with T4 DNA
ligase using the DNA (figure 3-(2)) synthesized according to Table 4. Table 4 shows the sequence of gene fragment
IK I, while Table 5 represents the sequence of gene fragment IK II.

This ligation mixture is then incubated with competent E. coli 79/02 cells. The transformation mixture is plated out
on IPTG/Xgal plates which contain 20 mg/l of ampicillin. The plasmid DNA is isolated from the white colonies and
characterized by restriction and DNA sequence analysis. The desired plasmids are called pIK1 (figure 3).

Accordingly, the DNA (figure 3-(5)) according to Table 5 is ligated into pUC19 which has been opened using PstI
and HindIII (figure 3-(4)). The plasmid pIK2 (figure 3) is obtained.

The DNA sequences (2) and (5) of figure 3 according to Tables 4 and 5 are reisolated from the plasmids pIK1 and
pIK2 and ligated with pUC19, which has been opened using KpnI and HindIII (figure 3-(7)). The plasmid pIK3 (figure
3) is thus obtained, which encodes a modified human insulin sequence.

The plasmid pIK3 is opened using MluI and SpeI and the large fragment (figure 3a-(9)) is isolated. This is ligated
with the DNA sequence (10)



B30        A1  A2  A3  A4  A5  A6  A7  A8    A9
(Thr)(Arg) Gly Ile Val Glu Gln Cys Cys (Thr) (Ser)   (10)
5'  CG CGT GGT ATC GTT GAA CAA TGT TGT A         3'
3'       A CCA TAG CAA CTT GTT ACA ACA TGA TC    5'
    (MluI)                             (SpeI)

which supplements the last codon of the B chain (B30) by one arginine codon, replaces the excised codons for the
first 7 amino acids of the A chain and supplements the codons for the amino acids 8 and 9 of this chain. The plasmid
pIK4 (figure 3a) is thus obtained, which encodes human mini-proinsulin.

In Tables 4 and 5, the B- and A-chains of the insulin molecule are in each case indicated by the first and last amino
acid. Next to the coding region in gene fragment IK II, there is a cleavage site for SalI which will be utilized in the
following construction.

Plasmid pIK4 is cut with HpaI and SalI and the gene coding for "mini-proinsulin" is isolated. This gene is ligated with
the above-mentioned large fragment of pINT41d and the following synthetic DNA sequence.




(Thr) Arg Met Gly Arg Phe
   CG CGT ATG GGC CGT TTC GTT
        A TAC CCG GCA AAG CAA
(MluI)                 (HpaI)



This gives rise to plasmid pB70, which codes for a fusion protein in which the ballast sequence (Table 1, line 1) is followed
by the amino acid sequence Met-Gly-Arg, which is followed by the amino acid sequence of the "mini-proinsulin".

TABLE 4: Gene fragment IK I (2)



                        Phe
                10           20            30           40
     CT TTG GAC AAG AGA TTC GTT AAC CAA CAC TTG TGT GGT TCT CAC
CAT GGA AAC CTG TTC TCT AAG CAA TTG GTT GTG AAC ACA CCA AGA GTG
(KpnI)                      HpaI

     50            60           70           80           90
TTG GTG GAA GCG TTG TAC TTG GTT TGT GGT GAG CGT GGT TTC TTC
AAC CAC CTT CGC AAC ATG AAC CAA ACA CCA CTC GCA CCA AAG AAG

                B30
                Thr Arg Lys Gly Ser Leu
          100          110          120
TAC ACT CCA AAG ACG CGT AAG GGT TCT CTG CA
ATG TGA GGT TTC TGC GCA TTC CCA AGA G
                MluI                (PstI)

TABLE 5: Gene fragment IK II (5)

   Gln Lys Arg Gly
      130          140          150          160
     G AAG CGT GGT ATC GTT GAA CAA TGT TGT ACT AGT ATC TGT TCT
AC GTC TTC GCA CCA TAG CAA CTT GTT ACA ACA TGA TCA TAG ACA AGA
(PstI)                                     SpeI

                                A21
                                Asn
   170          180          190          200          210
TTG TAC CAG CTG GAA AAC TAC TGT AAC TGA TAG TCG ACC CAT GGA
AAC ATG GTC GAC CTT TTG ATG ACA TTG ACT ACT AGC TGG GTA CCT TCG A
                                                     (HindIII)

Example 17

By using the oligonucleotides listed below, plasmids pINT90d to pINT96d are obtained in analogy to the
previous examples. An asterisk indicates the same encoded amino acid in the ballast constituent as in pINT41d.

pINT92d encodes a double mutation in the insulin derivative encoded by the plasmid pINT72d, since the codon for
Arg at the end of the ballast constituent and in the "mini C chain" is substituted by the codon for Met. Thus the expressed
preproduct can be cleaved with cyanogen bromide.

EP 0 489 780 B1 



pINT90d: ******GNSA* (variant of pINT69d)

TIR: 5'-CTGAAATGAGCTGTTGAC-3'

and

Insu50: 5'-TGCCGAATTTCCTGTTGATGTTGTTGC-3'

or

Insu49: 5'-GGAAATTCGGCACGATTTGTGAACCAG-3'

and

Insu11: 5'-TCATGTTTGACAGCTTATCAT-3'

pINT91d: ******GNSA* (variant of pINT72d)

TIR: 5'-CTGAAATGAGCTGTTGAC-3'

and

Insu50: 5'-TGCCGAATTTCCTGTTGATGTTGTTGC-3'

or

Insu49: 5'-GGAAATTCGGCACGATTTGTGAACCAG-3'

and

Insu11: 5'-TCATGTTTGACAGCTTATCAT-3'



pINT92d: (double mutant of pINT72d)

Insu56: 5'-TCGACCATGGCAACAACATCAACAATGTTTGTG-3'

and

Insu58: 5'-GATGCCCATGGTCTT-3'

or

Insu57: 5'-AAGACCATGGGCATC-3'

and

Insu11: 5'-TCATGTTTGACAGCTTATCAT-3'

pINT93d: ****** (variant of pINT68d)

Insu53: 5'-ACCATGGCAACAACATCAACAAAACGATTTGTG-3'

and

Insu11: 5'-TCATGTTTGACAGCTTATCAT-3'

pINT94d: ****** (variant of pINT68d)

Insu54: 5'-ACCATGGCAACAACATCAACACCACGATTTGTG-3'

and

Insu11: 5'-TCATGTTTGACAGCTTATCAT-3'

pINT95d: ****** (variant of pINT68d)

Insu55: 5'-TCGACCATGGCAACAACATCAACAATGCGATTTGTG-3'

and

Insu11: 5'-TCATGTTTGACAGCTTATCAT-3'

pINT96d: ****** (variant of pINT68d)

Insu71: 5'-ACCATGGCAACAACATCAACAGGACGATTTGTG-3'

and

Insu11: 5'-TCATGTTTGACAGCTTATCAT-3'

Claims

1. A process for the preparation of fusion proteins, which fusion proteins contain a desired protein and a ballast
constituent, which process comprises

(a) constructing a mixed oligonucleotide which codes for the said ballast constituent, wherein the said
oligonucleotide contains the DNA sequence (coding strand)

(DCD)x

in which D is A, G or T and x is 4 to 12,

(b) inserting the said mixed oligonucleotide into a vector so that it is functionally linked to a regulatory region
and to the structural gene coding for the said desired protein,

(c) transforming host cells with the so-obtained vector population and

(d) selecting from the transformants one or more clones expressing a fusion protein in high yield.

2. The process as claimed in claim 1, wherein the said oligonucleotide codes at its 3' end of the coding strand for an
amino acid or for a group of amino acids which allows an easy cleavage of the said desired protein from the said
ballast constituent.

3. The process as claimed in claim 2, wherein said cleavage is an enzymatic cleavage.

4. The process as claimed in claim 1, wherein the said oligonucleotide is designed so that it leads to a fusion protein
which is soluble or which easily can be solubilized.

5. The process as claimed in claim 1, wherein the said oligonucleotide is designed so that the ballast constituent
does not interfere with folding of the said desired protein.

6. The process as claimed in claim 1, wherein x is 4 to 8.

7. The process as claimed in claim 5, wherein the said oligonucleotide has the sequence (coding strand)

ATG (DCD)y (NNN)z

wherein N in the NNN triplet stands for identical or different nucleotides, excluding stop codons for NNN, z is 1 to
4 and y + z is 6 to 12, y being at least 4.

8. The process as claimed in claim 7, wherein y + z is 6 to 10.

9. The process as claimed in claim 7, wherein y is 5 to 8 and z is 1.

10. The process as claimed in claim 1, wherein the said oligonucleotide has the sequence (coding strand)

ATG GCW (DCD)4-8 CGW

in which W is A or T.
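
The degenerate triplet DCD used in the claims can be enumerated to see which amino acids a mixed oligonucleotide of this kind may encode and how many variants it spans. The following sketch is an illustration only: it assumes the standard genetic code, and the helper names are hypothetical and do not occur in the specification.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# D stands for A, G or T, as defined in claim 1.
D_BASES = "AGT"

# All codons matching the pattern D-C-D.
dcd_codons = [first + "C" + third for first, third in product(D_BASES, repeat=2)]

# Amino acids (one-letter code) that these codons can encode.
dcd_amino_acids = sorted({CODON_TABLE[codon] for codon in dcd_codons})


def library_size(x: int) -> tuple[int, int]:
    """Number of distinct DNA and protein variants of (DCD)x."""
    return len(dcd_codons) ** x, len(dcd_amino_acids) ** x


if __name__ == "__main__":
    print(dcd_codons)
    print(dcd_amino_acids)
    for x in (4, 8, 12):
        print(x, library_size(x))
```

Under the standard code, the nine DCD codons encode only alanine, serine and threonine, so (DCD)x with x = 4 already covers 6561 DNA variants and 81 distinct peptide sequences.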
FIG. 2